Continuous Speech Recognition


Continuous Speech Recognition by Linked Predictive Neural Networks

Neural Information Processing Systems

We present a large vocabulary, continuous speech recognition system based on Linked Predictive Neural Networks (LPNNs). The system uses neural networks as predictors of speech frames, yielding distortion measures which are used by the One Stage DTW algorithm to perform continuous speech recognition. The system, already deployed in a Speech to Speech Translation system, currently achieves 95%, 58%, and 39% word accuracy on tasks with perplexity 5, 111, and 402 respectively, outperforming several simple HMMs that we tested. We also found that the accuracy and speed of the LPNN can be slightly improved by the judicious use of hidden control inputs. We conclude by discussing the strengths and weaknesses of the predictive approach.
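As a rough illustration of the predictive approach, the sketch below uses hypothetical per-state linear predictors in place of the paper's neural networks: each state's prediction error for a frame serves as the local distortion, and a simplified DTW recursion (not the full One Stage algorithm) accumulates the best alignment cost. All data and weights here are made up.

```python
import numpy as np

def dtw_cost(distortion):
    """Minimal DTW over a frame-by-state distortion matrix.
    Allowed moves: stay in the same state or advance one state
    (a simplification of the One Stage DTW transitions)."""
    T, S = distortion.shape
    D = np.full((T, S), np.inf)
    D[0, 0] = distortion[0, 0]
    for t in range(1, T):
        for s in range(S):
            best_prev = D[t - 1, s]
            if s > 0:
                best_prev = min(best_prev, D[t - 1, s - 1])
            D[t, s] = distortion[t, s] + best_prev
    return D[-1, -1]

# Hypothetical "predictors": one linear map per state, standing in for
# the neural networks; distortion is the squared prediction error of
# each frame given the previous frame.
rng = np.random.default_rng(0)
frames = rng.normal(size=(20, 8))                  # 20 frames, 8-dim features
predictors = [rng.normal(size=(8, 8)) * 0.1 for _ in range(5)]
distortion = np.array([[np.sum((frames[t] - W @ frames[t - 1]) ** 2)
                        for W in predictors]
                       for t in range(1, len(frames))])
print(dtw_cost(distortion))
```

The predictor producing the smallest accumulated error along the alignment path effectively "explains" that stretch of speech, which is how prediction error doubles as a recognition score.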


Multi-State Time Delay Networks for Continuous Speech Recognition

Neural Information Processing Systems

We present the "Multi-State Time Delay Neural Network" (MS-TDNN) as an extension of the TDNN to robust word recognition. The resulting system has the ability to manage the sequential order of subword units. In this paper we present extensive new evaluations of this approach on speaker-dependent and speaker-independent connected-alphabet tasks.


Segmental Neural Net Optimization for Continuous Speech Recognition

Neural Information Processing Systems

Previously, we had developed the concept of a Segmental Neural Net (SNN) for phonetic modeling in continuous speech recognition (CSR). This kind of neural network technology advanced the state of the art of large-vocabulary CSR, which employs Hidden Markov Models (HMMs), on the ARPA 1,000-word Resource Management corpus. More recently, we started porting the neural net system to a larger, more challenging corpus: the ARPA 20,000-word Wall Street Journal (WSJ) corpus. During the porting, we explored the following research directions to refine the system: i) training context-dependent models with a regularization method; ii) training the SNN with projection pursuit; and iii) combining different models into a hybrid system. When tested on both a development set and an independent test set, the resulting neural net system alone yielded performance at the level of the HMM system, and the hybrid SNN/HMM system achieved a consistent 10-15% word error reduction over the HMM system.


Hierarchical Mixtures of Experts Methodology Applied to Continuous Speech Recognition

Neural Information Processing Systems

In this paper, we incorporate the Hierarchical Mixtures of Experts (HME) method of probability estimation, developed by Jordan [1], into an HMM-based continuous speech recognition system. The resulting system can be thought of as a continuous-density HMM system, but instead of using Gaussian mixtures, the HME system employs a large set of hierarchically organized but relatively small neural networks to perform the probability density estimation. The hierarchical structure is reminiscent of a decision tree except for two important differences: each "expert" or neural net performs a "soft" decision rather than a hard decision, and, unlike ordinary decision trees, the parameters of all the neural nets in the HME are automatically trainable using the EM algorithm. We report results on the ARPA 5,000-word and 40,000-word Wall Street Journal corpus using HME models.
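A minimal sketch of the "soft decision" idea, under stated assumptions: softmax gates split responsibility at each level of a two-level tree, and each leaf "expert" is here a unit-variance Gaussian standing in for the small neural networks of the paper. The gating weights and means are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max()           # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

def hme_density(x, top_w, sub_w, expert_means):
    """Two-level HME density estimate p(x). Each gate makes a 'soft'
    decision (a probability over branches), so every leaf contributes
    to the density, weighted by the product of gate probabilities.
    top_w: (branches, dim); sub_w: (branches, leaves, dim);
    expert_means: (branches, leaves, dim)."""
    g_top = softmax(top_w @ x)
    p = 0.0
    for i in range(len(top_w)):
        g_sub = softmax(sub_w[i] @ x)
        for j in range(sub_w.shape[1]):
            diff = x - expert_means[i, j]
            leaf = np.exp(-0.5 * diff @ diff) / (2 * np.pi) ** (len(x) / 2)
            p += g_top[i] * g_sub[j] * leaf
    return p
```

Because the gate outputs are probabilities rather than hard branch choices, the whole expression is differentiable and the mixture form makes EM updates natural, which is the trainability advantage the abstract highlights over ordinary decision trees.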


Minimum Bayes Error Feature Selection for Continuous Speech Recognition

Neural Information Processing Systems

We consider the problem of designing a linear transformation θ ∈ ℝ^{p×n}, of rank p ≤ n, which projects the features of a classifier x ∈ ℝ^n onto y = θx ∈ ℝ^p so as to achieve minimum Bayes error (or probability of misclassification). Two avenues will be explored: the first is to maximize the θ-average divergence between the class densities and the second is to minimize the union Bhattacharyya bound in the range of θ. While both approaches yield similar performance in practice, they outperform standard LDA features and show a 10% relative improvement in the word error rate over state-of-the-art cepstral features on a large vocabulary telephony speech recognition task.
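To make the projection design concrete, the sketch below evaluates the Bhattacharyya bound for a hypothetical pair of 3-D Gaussian class densities under two candidate rank-1 projections θ: the direction aligned with the class-mean difference preserves the separation, while an orthogonal one destroys it. The densities are invented for illustration, not taken from the paper.

```python
import numpy as np

def bhattacharyya(mu1, S1, mu2, S2):
    """Bhattacharyya distance between two Gaussians; for equal priors
    the Bayes error is bounded above by 0.5 * exp(-B)."""
    S = 0.5 * (S1 + S2)
    dm = mu1 - mu2
    term1 = 0.125 * dm @ np.linalg.solve(S, dm)
    term2 = 0.5 * np.log(np.linalg.det(S) /
                         np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term1 + term2

# Hypothetical 3-D class densities, projected by theta in R^{1x3}.
mu1, mu2 = np.array([0., 0., 0.]), np.array([2., 0., 0.])
S = np.eye(3)

def projected_B(theta):
    th = np.atleast_2d(theta)
    return bhattacharyya(th @ mu1, th @ S @ th.T, th @ mu2, th @ S @ th.T)

good = projected_B(np.array([1., 0., 0.]))  # along the mean difference
bad = projected_B(np.array([0., 1., 0.]))   # orthogonal to it
print(good, bad)
```

A larger Bhattacharyya distance means a smaller error bound, so minimizing the union bound over θ, as in the abstract's second avenue, amounts to searching for projections like `good` rather than `bad`.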


Fusayasu

AAAI Conferences

In spite of the recent advancements in speech recognition, recognition errors are unavoidable in continuous speech recognition. In this paper, we focus on a word-error correction system for continuous speech recognition using confusion networks. Conventional N-gram correction is widely used; however, its performance degrades because the N-gram approach cannot capture relationships between distant words. To improve on the N-gram model, we employ Normalized Relevance Distance (NRD) as a measure of semantic similarity between words. NRD captures not only co-occurrence but also the correlation of the terms' importance across documents, so it can estimate semantic similarity even between words located far from each other. The effectiveness of our method was evaluated on continuous speech recognition tasks with multiple test speakers. Experimental results show that our error-correction method outperforms methods using other features.
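For illustration only, a distance of this family can be sketched in its NGD-style form on raw document counts; NRD as used in the paper replaces raw counts with relevance-weighted statistics, so the function and numbers below are an assumption-laden stand-in, not the paper's exact formula.

```python
import math

def normalized_distance(fx, fy, fxy, N):
    """NGD-style normalized distance on document counts.
    fx, fy: number of documents containing each word;
    fxy: number of documents containing both; N: corpus size.
    Smaller values mean the words are semantically closer."""
    lx, ly, lxy, lN = (math.log(v) for v in (fx, fy, fxy, N))
    return (max(lx, ly) - lxy) / (lN - min(lx, ly))

# Words that co-occur in most of their documents score near 0;
# words that rarely co-occur score much higher.
near = normalized_distance(1000, 800, 700, 10**6)
far = normalized_distance(1000, 800, 5, 10**6)
print(near, far)
```

Because the statistics come from whole documents rather than a fixed N-gram window, such a measure can relate words regardless of how far apart they appear in the recognized sentence, which is the property the abstract exploits.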


Continuous Silent Speech Recognition using EEG

Krishna, Gautam, Tran, Co, Carnahan, Mason, Tewfik, Ahmed

arXiv.org Machine Learning

In this paper we explore continuous silent speech recognition using electroencephalography (EEG) signals. We implemented a connectionist temporal classification (CTC) automatic speech recognition (ASR) model to translate to text EEG signals recorded while subjects read English sentences in their mind without producing any voice. Our results demonstrate the feasibility of using EEG signals for continuous silent speech recognition. We report results for a limited English vocabulary consisting of 30 unique sentences.
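The CTC output convention the model relies on can be illustrated with a toy greedy (best-path) decoder on synthetic frame posteriors; in the paper these would come from the EEG model, and the 3-symbol alphabet here is hypothetical.

```python
import numpy as np

BLANK = 0

def ctc_greedy_decode(logits, id2char):
    """Greedy (best-path) CTC decoding: take the argmax label per
    frame, collapse consecutive repeats, then drop blanks."""
    path = logits.argmax(axis=1)
    out, prev = [], BLANK
    for p in path:
        if p != prev and p != BLANK:
            out.append(id2char[p])
        prev = p
    return "".join(out)

# Hypothetical 3-class posteriors (blank, 'h', 'i') over 6 frames.
id2char = {1: "h", 2: "i"}
logits = np.array([[0.1, 0.8, 0.1],    # h
                   [0.1, 0.8, 0.1],    # h (repeat, collapsed)
                   [0.9, 0.05, 0.05],  # blank
                   [0.1, 0.1, 0.8],    # i
                   [0.9, 0.05, 0.05],  # blank
                   [0.9, 0.05, 0.05]]) # blank
print(ctc_greedy_decode(logits, id2char))  # -> "hi"
```

The blank symbol and repeat-collapsing are what let a CTC model emit a short sentence from many EEG frames without needing frame-level alignments at training time.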


EEG based Continuous Speech Recognition using Transformers

Krishna, Gautam, Tran, Co, Carnahan, Mason, Tewfik, Ahmed H

arXiv.org Machine Learning

In this paper we investigate continuous speech recognition from electroencephalography (EEG) features using the recently introduced end-to-end transformer-based automatic speech recognition (ASR) model. Our results show that the transformer-based model trains and infers faster than recurrent neural network (RNN) based sequence-to-sequence EEG models, but the RNN-based models performed better at test time on a limited English vocabulary. Continuous speech recognition using non-invasive brain signals, or electroencephalography (EEG) signals, is an emerging area of research in which EEG signals recorded from the scalp of the subject are translated to text. EEG-based continuous speech recognition technology gives people with speaking disabilities, or people who are unable to speak, better access to technology. Current state-of-the-art voice assistant systems mainly process acoustic input features, limiting technology accessibility for people with speaking disabilities or no ability to produce voice.


Continuous Speech Recognition using EEG and Video

Krishna, Gautam, Carnahan, Mason, Tran, Co, Tewfik, Ahmed H

arXiv.org Machine Learning

In this paper we investigate whether electroencephalography (EEG) features can be used to improve the performance of continuous visual speech recognition systems. We implemented a connectionist temporal classification (CTC) based end-to-end automatic speech recognition (ASR) model for performing recognition. Our results demonstrate that EEG features help enhance the performance of continuous visual speech recognition systems. In recent years there has been a lot of interesting work in the fields of lip reading and audio-visual speech recognition: in [1] the authors demonstrated end-to-end sentence-level lip reading, and in [2] the authors demonstrated deep learning based end-to-end audio-visual speech recognition.


Artificial Intelligence Programming in Java

#artificialintelligence

Several programming languages are available for developing an artificial intelligence project, such as Python, POP-11, C, MATLAB, Java, Lisp, and the Wolfram Language. In this article, you will find out how Java programming works with artificial intelligence. The main feature of Java is the Java Virtual Machine (JVM), an abstract machine available on many hardware and software platforms. The JVM loads code, verifies code, provides a runtime environment, and executes code.